
C3D: Mitigating the NUMA Bottleneck via Coherent DRAM Caches



Abstract

Massive datasets prevalent in scale-out, enterprise, and high-performance computing are driving a trend toward ever-larger memory capacities per node. To satisfy the memory demands and maximize performance per unit cost, today's commodity HPC and server nodes tend to feature multi-socket shared-memory NUMA organizations. An important problem in these designs is the high latency of accessing memory on a remote socket, which degrades performance in workloads with large shared data working sets.

This work shows that emerging DRAM caches can help mitigate the NUMA bottleneck by filtering up to 98% of remote memory accesses. To be effective, these DRAM caches must be private to each socket to allow caching of remote memory, which comes with the challenge of ensuring coherence across multiple sockets and GBs of DRAM cache capacity. Moreover, the high access latency of DRAM caches, combined with high inter-socket communication latencies, can make hits to remote DRAM caches slower than main memory accesses. These features challenge existing coherence protocols optimized for on-chip caches with fast hits and modest storage capacity. Our solution to these challenges relies on two insights. First, keeping DRAM caches clean avoids the need to ever access a remote DRAM cache on a read. Second, a non-inclusive on-chip directory that avoids tracking blocks in the DRAM cache enables a lightweight protocol for guaranteeing coherence without the staggering directory costs. Our design, called Clean Coherent DRAM Caches (C3D), leverages these insights to improve performance by 6.4-50.7% in a quad-socket system versus a baseline without DRAM caches.
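The abstract's first insight can be illustrated with a small model. The sketch below is purely hypothetical (the paper's actual protocol and latencies are not given here): it models why, if every socket's DRAM cache is kept clean (memory always holds an up-to-date copy of every block), a local DRAM-cache miss can be served directly from main memory, skipping the slow probe of remote sockets' DRAM caches. All latency values and names are illustrative assumptions.

```python
# Hypothetical sketch of the clean-DRAM-cache read path (not the paper's code).
# Assumed latencies, in arbitrary cycles:
LOCAL_DCACHE_HIT = 40     # hit in this socket's DRAM cache
MEM_ACCESS = 90           # main-memory access
REMOTE_DCACHE_HIT = 140   # remote DRAM-cache hit (slower than memory!)

class Socket:
    """A socket with a private DRAM cache. Under a clean-cache policy,
    every cached block matches main memory, so no copy is ever dirty."""
    def __init__(self):
        self.dram_cache = {}  # block address -> data

def read(block, local, memory):
    """Return (data, latency) for a read issued on socket `local`.
    Because all DRAM caches are clean, a local miss never needs to
    probe remote DRAM caches: memory is guaranteed up to date."""
    if block in local.dram_cache:
        return local.dram_cache[block], LOCAL_DCACHE_HIT
    data = memory[block]              # fetch straight from main memory
    local.dram_cache[block] = data    # fill the local DRAM cache
    return data, MEM_ACCESS

memory = {0x100: "A"}
s0, s1 = Socket(), Socket()
s1.dram_cache[0x100] = "A"            # a remote socket also caches the block

data, lat = read(0x100, s0, memory)   # miss: served by memory, remote cache skipped
assert data == "A" and lat == MEM_ACCESS
data, lat = read(0x100, s0, memory)   # now a local DRAM-cache hit
assert lat == LOCAL_DCACHE_HIT
```

With dirty remote copies allowed, the miss path would instead have to probe remote DRAM caches at the assumed 140-cycle cost; the clean policy caps a read miss at main-memory latency, which is the point the abstract makes about remote DRAM-cache hits being slower than memory.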
